Services for Data Access and Data Processing on Grids
Authors
Vijayshankar Raman, IBM Almaden Research Center
Inderpal Narang, IBM Almaden Research Center
Chris Crone, IBM Silicon Valley Lab
Laura Haas, IBM Silicon Valley Lab
Susan Malaika, IBM Silicon Valley Lab
Tina Mukai, IBM Silicon Valley Lab
Dan Wolfson, IBM Silicon Valley Lab
Chaitan Baru, San Diego Supercomputer Center
GFD-I.14, DAIS Working Group
Abstract
An increasing number of grid applications manage data at very large scale, in both size and distribution. In this paper we discuss data access and data processing services for such applications in the context of a grid. The complexity of data management on a grid arises from the scale, dynamism, autonomy, and distribution of data sources. The main argument of this paper is that these complexities should be made transparent to grid applications through a layer of virtualization services. We start by discussing the various dimensions of transparent data access and processing, and illustrate their benefits in the context of a specific application. We then present a layer of grid data virtualization services that provide such transparency and ease data access and processing. These services support federated access to distributed data, dynamic discovery of data sources by content, dynamic migration of data for workload balancing, parallel data processing, and collaboration. We describe both our long-term vision for these services and a concrete proposal for what is achievable in the near term. We also discuss the support that grid data sources can provide to enable efficient virtualization.
Similar resources
Improving Mobile Grid Performance Using Fuzzy Job Replica Count Determiner
Grid computing is a term referring to the combination of computer resources from multiple administrative domains to reach a common computational platform. Mobile computing is a generic term for the use of movable, handheld devices with wireless communication for processing data. Mobile computing focuses on providing access to data, information, services and communications anywhere an...
Improving Data Grids Performance by Using Modified Dynamic Hierarchical Replication Strategy
Abstract: A Data Grid connects a collection of geographically distributed computational and storage resources that enables users to share data and other resources. Data replication, a technique much discussed by Data Grid researchers in recent years, creates multiple copies of files and places them in various locations to shorten file access times. In this paper, a dynamic data replication strate...
A Federated Grid Environment with Replication Services
Grids can be classified as computational grids, access grids and data grids. Computational grids address applications that deal with complex and time intensive computational problems, usually on relatively small datasets. Access grids focus on group-to-group communication, whereas data grids address the needs of applications that deal with the evaluation and mining of large amounts of data in t...
E2DR: Energy Efficient Data Replication in Data Grid
Abstract: Data grids are an important branch of grid computing which provide mechanisms for the management of large volumes of distributed data. Energy efficiency has recently emerged as a hot topic in large distributed systems. The development of computing systems is traditionally focused on performance improvements driven by the demand of client's applications in scientific and business domai...